Profiling expression strategies for a type III polyketide synthase in a lysate-based, cell-free system

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli often results in a myriad of unpredictable issues with regard to protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as a valuable testbed for rapidly prototyping expression parameters. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We applied a library of constructs with different combinations of promoters and rppA coding sequences to investigate the synergies between promoter and codon usage. Subsequently, we assessed the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. More importantly, the choice of coding sequence and promoter impacts protein expression synergistically, which should be considered for future efforts to use CFE for high-yield protein expression. The promoter strategy when applied to RppA was not completely correlated with that observed with GFP, indicating that different promoter strategies should be applied for different proteins. In vivo experiments suggest that there is correlation, but not complete alignment, between expression in cell-free systems and in vivo.
Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs, which advances CFE as a tool for natural product research.

Microbial secondary metabolism generates a vast number of complex secondary metabolites, or natural products, with varied chemistry 1. Members of the order Actinomycetales, and especially those from the genus Streptomyces, have been shown to be a valuable lineage in terms of their capacity for secondary metabolism [2][3][4][5]. Indeed, Streptomyces genomes are known for harboring a large number of biosynthetic gene clusters (BGCs) that are attractive for the discovery of novel enzymes and their chemical products. Unfortunately, these positive aspects of Streptomyces spp. are offset by their slow doubling time, mycelial clumping, thick cell walls, high GC content (~70-75%), and relatively cumbersome genetics. Consequently, biochemists who study Streptomyces spp.-derived natural products often spend substantial time optimizing soluble, appropriately folded functional expression, presenting a bottleneck to enzymatic characterization. As a result, most efforts for expressing proteins of interest as soluble, functional constructs are done in tractable hosts like E. coli, rather than Streptomyces spp. Because of the deep evolutionary divergence between Actinobacteria and Proteobacteria such as E. coli, the differences in their metabolic backgrounds, dissimilar codon usage and genome attributes, as well as protein folding environments, expression of heterologous genes from Streptomyces spp. and, in turn, product synthesis often fail 6. Where BGC expression in E. coli is possible, a myriad of parameters usually require optimization. Additionally, product toxicity can impede the use of E. coli strains without engineering host tolerance. Finding suitable recombinant expression systems, therefore, involves screening refactoring choices that include the choice of regulatory elements, like promoters, and coding strategy. This process can be time-consuming and laborious, a significant bottleneck to researcher workflows.
Cell-free expression (CFE) platforms employ either crude cell lysates (or extracts) or an in vitro transcription and translation (TX-TL) PURE system and can bypass limitations in secondary metabolite production observed in in vivo expression systems 7,8. The TX-TL PURE system, or PURExpress, employs the minimal number of recombinant elements required for transcription and translation, while crude cell lysates are derived from intact living cells. Harnessing the TX-TL machinery preserved in lysates allows protein expression in the absence of other normal cellular functions, a feature that can be leveraged to manufacture enzymes that are difficult to synthesize in microbial hosts. Lysates also retain metabolic pathways that can be engineered to accumulate precursor molecules for heterologous biosynthetic enzymes [9][10][11]. CFE systems thus present an emerging alternative approach for expressing BGCs and synthesizing their product metabolites, especially when cytotoxicity represents a limiting factor to heterologous in vivo production [12][13][14]. Additionally, CFE systems can be used to optimize the cell-based expression of soluble BGC enzymes when leveraged as testbeds for genetic refactoring. Prototyping different genetic constructs in these platforms is relatively rapid as it bypasses time limitations associated with culturing and genetically manipulating live cells. To these ends, investigating refactoring strategies in a cell-free environment can benefit the development of cell-free and cell-based BGC expression platforms.
While optimizations of protein expression via refactoring have usually focused on robustly expressed reporters (e.g., sfGFP) [15][16][17][18][19][20], we sought to evaluate common refactoring parameters using a reporter that was relevant to the enzymatic activity of genes involved in secondary metabolite formation 21. To further explore the ability to refactor for functional catalytic activity, we chose a model protein, RppA (40.1 kDa), which generates the red pigment flaviolin and has limited catalytically functional expression in E. coli even with current optimization [22][23][24].
RppA is a type III polyketide synthase from Streptomyces griseus. As a type III PKS (as opposed to the type I and type II PKSs), it condenses free malonyl-CoA directly rather than covalently transferring it to a carrier protein; thus, no phosphopantetheinyl transferase is required for activation to a holo protein 25. Applying flaviolin production as a reporter for catalytically functional RppA expression, we varied parameters that are relevant to improving catalytically functional expression. This included the use of three different, commonly used inducible promoters: the workhorse T7/lac system, the pBAD arabinose promoter 26, and the pTet anhydrotetracycline promoter 27. In addition to varying promoters, we also evaluated the impact of four different methods for designing synonymous coding sequences for generating higher levels of heterologous gene expression. Notably, while used extensively in vivo, inducible promoters beyond the T7 system have not been extensively investigated in CFE. We also demonstrate strong positive correlations between cell-free and cell-based expression and discuss the feasibility of this approach for refactoring challenging proteins in vitro prior to in vivo production. Taken together, this work demonstrates a coordinated strategy to apply a lysate-based cell-free environment for profiling genetic constructs that promote catalytically active enzyme formation. The results are applicable to both cell-free and cell-based systems and can be used to generate biosynthetic proteins for characterizing elements of engineered biosynthetic pathways (Fig. 1).

Figure 1.
Overall schematic for monitoring and prototyping RppA activity.

Design of the vector system
To design a library of constructs to improve expression, we focused on two commonly varied refactoring parameters: promoter choice and codon usage. These two parameters were chosen because they are among the most commonly altered when troubleshooting soluble expression of high-GC genes in E. coli 21,28. We used four distinct coding sequences and four distinct promoters (Fig. 2A). In terms of promoter choice, while the T7-lac promoter 28 combination is the basis of the most commonly used expression system in E. coli (particularly the pET expression system) 29,30, other promoters that are not as strong as the T7 promoter nor as leaky as the IPTG-inducible lac operon are sometimes used to promote soluble expression of challenging-to-express proteins in vivo 28. Other commonly used promoters include the pTet promoter, which is anhydrotetracycline inducible 28, and the pBAD promoter, which is arabinose inducible 26, both of which are less leaky than the lac operon (Fig. 2B). To obtain these constructs, we used a series of BioBrick vectors designed by Keasling and coworkers 28 that included pBbE2k (harboring the pTet promoter), pBbE8k (harboring the pBAD promoter), and pBbE7k (harboring the T7 promoter under control of the lac operon).
To complement this series of promoters, we varied synonymous coding sequences. When expressing proteins from a high-GC bacterium that differs substantially from E. coli in terms of codon usage, there are several approaches that can be taken. While sometimes the natural coding sequence results in successful protein expression in E. coli, protein expression and folding issues can occur. These issues with expression and poor folding/solubility can originate from a mismatch between the tRNA pools typical of each organism, which can change the rate of translation (e.g., cause stalls at the ribosome) and potentially disrupt appropriate co-translational folding [31][32][33][34]. Codon optimization is a common strategy to alter codon assignments via replacement with codons used more frequently in the host's genome or transcriptome, and appropriate algorithms are readily accessible from most commercial gene synthesis companies [35][36][37][38][39][40][41]. While these genome- and transcriptome-optimized sequences can aid in the successful expression of the heterologous product, improperly folded products can still result 42,43. Another approach is codon "harmonization," which has been posited to improve expression via better co-translational folding 44. Codon harmonization involves identifying patterns of codon usage in the donor organism and replicating them with comparable patterns of codon usage in the heterologous host 45,46. Typically, synonymous codons (or sliding windows of these codons) are assigned computational estimates of their frequency of appearance in the original host organism. Next, single codons are changed based on frequencies estimated in the desired heterologous host to better replicate the source organism's frequency patterns, which enables stalling patterns at the ribosome that are more akin to how they originally evolved and therefore might result in proper protein folding 47.
To explore the effect of synonymous coding, four constructs were compared. These were the native coding sequence (N-rppA), amplified directly from Streptomyces griseus genomic DNA; a routinely codon-optimized sequence (O-rppA), generated with Integrated DNA Technologies' (IDT) codon optimization algorithm; and two codon-harmonized sequences, the first designed using CHARMING (for Codon HARMonizING) (HC-rppA) 43 and the second using a new method, the Ribosome Overhead Costs Stochastic Evolutionary Model of Protein Production Rate (ROC-SEMPPR) 48 (HR-rppA) (Fig. 2C). Briefly, the CHARMING method applies a relative measure of codon usage called "% Min-Max" (%MM) 43. %MM values are computed as described by Chaney et al. 34,49,50 using overall codon usage from an organism, obtainable from various sources including codon usage information tabulated from the international DNA sequence databases (Kazusa: https://www.kazusa.or.jp/codon/) 51. CHARMING uses a sliding window to estimate %MM-based deviations between the original and target organism codon usage for a given protein. While large deviations exist within one or more windows, single synonymous changes are made that best "harmonize" the values, i.e., reduce the overall %MM difference in that specific window. In short, this algorithm minimizes the sum of |%MM_original − %MM_target| over all windows and proceeds until five consecutive iterations find no beneficial change, i.e., no change that reduces the differences between the %MM values. The size of the windows used was the CHARMING default value, which was set to be most consistent with ribosome fingerprint-based pausing estimates 43. In contrast, ROC-SEMPPR takes an evolutionary approach to estimating the translational efficiency of an amino acid's synonymous codons within a given organism. ROC-SEMPPR does so by fitting a probabilistic, population genetics-based model of sequence evolution, which includes the contributions of selection, mutation bias, and genetic drift, to an organism's coding sequences 48,52. By simultaneously analyzing intragenic and intergenic patterns of synonymous codon usage within a genome, ROC-SEMPPR estimates differences in ribosome pausing times among synonymous codons (translational efficiency), mutation bias between codons, and differences in protein production rates between genes. Fitting ROC-SEMPPR separately to the donor (in this case Streptomyces griseus) and host (in this case E. coli) genomes makes it possible to rank each amino acid's synonymous codons by their translational efficiencies within the donor and host, respectively.
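The window-based objective described above for CHARMING can be illustrated with a minimal sketch. This is not the CHARMING implementation: the %MM formula below follows the commonly used Min-Max definition (window sums of actual, maximum, minimum, and mean synonymous codon frequencies), and the function names, the two-amino-acid codon table, its frequencies, and the window size of 3 are all assumptions made for illustration only.

```python
from statistics import mean

# Toy (invented) codon frequencies for two amino acids; not real usage data.
TOY_USAGE = {"AAA": 0.75, "AAG": 0.25, "GAA": 0.70, "GAG": 0.30}
# Each codon maps to the full set of synonymous codons for its amino acid.
TOY_SYNONYMS = {
    "AAA": ["AAA", "AAG"], "AAG": ["AAA", "AAG"],
    "GAA": ["GAA", "GAG"], "GAG": ["GAA", "GAG"],
}

def percent_min_max(codons, usage, synonyms, window=3):
    """Return one %MM value per sliding window of `window` codons."""
    values = []
    for start in range(len(codons) - window + 1):
        actual = maximum = minimum = average = 0.0
        for codon in codons[start:start + window]:
            freqs = [usage[c] for c in synonyms[codon]]
            actual += usage[codon]
            maximum += max(freqs)
            minimum += min(freqs)
            average += mean(freqs)
        if actual >= average:
            # Window is biased toward common codons: positive %MM.
            denom = maximum - average
            values.append(100.0 * (actual - average) / denom if denom else 0.0)
        else:
            # Window is biased toward rare codons: negative %MM.
            denom = average - minimum
            values.append(-100.0 * (average - actual) / denom if denom else 0.0)
    return values

def harmonization_cost(mm_original, mm_target):
    """Sum of |%MM_original - %MM_target| over all windows."""
    return sum(abs(a - b) for a, b in zip(mm_original, mm_target))

common = ["AAA", "GAA", "AAA", "GAA"]  # only the most frequent codons
rare = ["AAG", "GAG", "AAG", "GAG"]    # only the least frequent codons
mm_common = percent_min_max(common, TOY_USAGE, TOY_SYNONYMS)
mm_rare = percent_min_max(rare, TOY_USAGE, TOY_SYNONYMS)
print(mm_common, mm_rare, harmonization_cost(mm_common, mm_rare))
```

In a harmonization run, `mm_original` would be computed with the donor's codon usage table on the native sequence and `mm_target` with the host's table on the candidate sequence; a synonymous change is kept only if it lowers the cost.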
The promoter and coding sequence combinations represent a total of 16 different constructs. The naming convention for the components of the 16 constructs is detailed in Fig. 2.
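The 4 × 4 design can be enumerated in a few lines. The sketch below assumes that every construct name follows the vector-coding sequence pattern of pBbE7k-O-rppA, which is the only full construct name spelled out in the text; the other fifteen names are extrapolated from that pattern and may differ from the labels in Fig. 2.

```python
from itertools import product

# Promoter -> BioBrick vector backbone, as named in the text.
VECTORS = {
    "pT7": "pBbE7k",
    "pTet": "pBbE2k",
    "pBAD": "pBbE8k",
    "pJ23101": "pBbEJk",
}
# Coding sequence label -> how the sequence was designed.
CODING = {
    "N-rppA": "native",
    "O-rppA": "IDT codon-optimized",
    "HC-rppA": "CHARMING-harmonized",
    "HR-rppA": "ROC-SEMPPR-harmonized",
}

# Assumed naming pattern: <vector>-<coding sequence>.
constructs = [f"{vector}-{cds}" for vector, cds in product(VECTORS.values(), CODING)]

for name in constructs:
    print(name)
```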

Initial cell-free experiments for a type III PKS enable the production of flaviolin
RppA catalyzes polyketide synthesis by condensing and cyclizing five molecules of malonyl-CoA, resulting in the pentaketide tetrahydroxynaphthalene (THN) 22,23. Subsequently, THN undergoes spontaneous oxidation to flaviolin (Fig. 3A). The formation of the red-brown flaviolin pigment can thus be monitored, as it readily absorbs light at 340 nm above cellular background 24,53. As metabolite production in an E. coli lysate-based cell-free system is correlated with the amount of protein that is expressed 54, we used the amount of flaviolin produced as a proxy for estimating catalytically functional RppA production. To establish assay conditions, we first sought to determine the threshold for pigment detection against a cell lysate background. To do so, we spiked purified flaviolin into lysate preparations in the absence of DNA. Pigment could be detected in the micromolar range, demonstrating sufficient sensitivity to proceed (Fig. S1). With an assay established, pigment production could be tested at variable temperature conditions and plasmid DNA concentrations.
To define temperature conditions for conducting CFE experiments, we used the pET28b expression plasmid and codon-optimized rppA (O-rppA) in a lysate-based system using E. coli BL21 Star(DE3), a BL21(DE3) strain that has the DE3 lysogen under control of a lac-UV5 promoter 55. We have previously had success using this lysate for the production of pigments from Streptomyces 54. Our rationale for this initial choice of coding sequence and promoter was twofold: (1) prior studies from our laboratory suggest that heterologous expression of enzymes originating from Streptomyces that form pigments show improved expression and solubility when using E. coli optimized sequences 33 and (2) the T7 promoters (and specifically pET vectors) have been widely used in CFE 54,[56][57][58][59][60]. These initial experiments revealed superior pigment production at 30 °C, so we proceeded with 30 °C for all subsequent experiments (Fig. S2). To remove confounding variables from differing intergenic regions and a different origin of replication in the pET vector as compared to the BioBrick vectors, we repeated this experiment using pBbE7k as the backbone for consistency with the pBAD and pTet constructs. Increasing the amount of the plasmid construct pBbE7k-O-rppA (containing E. coli codon-optimized rppA driven by the pT7 promoter) with E. coli BL21 Star(DE3) lysate containing endogenous IPTG at 30 °C demonstrated that the DNA template concentration affects protein expression and therefore product formation (Fig. 3B). We found that the flaviolin signal reached its maximal value after approximately 2-3 h despite the use of a modified PANOx-SP system 61, which typically prolongs protein synthesis up to ~10 h. We are unsure of the major factor limiting flaviolin production, which could be related to enzymatic activity or substrate concentration. This also illustrates the complexity of monitoring an enzymatic reporter (as opposed to a reporter for protein synthesis such as sfGFP), as reaction kinetics, substrate turnover, and degradation can all come into play when monitoring output.

Establishing the application of non-IPTG inducers for cell-free with other promoters
The IPTG-inducible T7-lac expression system, especially the pET expression system, is by far the most heavily used expression system for heterologous recombinant protein production in E. coli 61. However, its extreme promoter strength, combined with the leakiness of the lac operon, adds to the metabolic burden of the cell, resulting in decreased fitness and less than optimal protein expression. For proteins that seem to be better expressed and appropriately folded with tighter transcriptional control, other inducible promoter systems have been developed, such as the arabinose-inducible pBAD and anhydrotetracycline-inducible pTet systems. Indeed, there is precedent for alternate promoter systems improving the expression of proteins from Streptomyces that do not express under standard expression conditions (e.g., commercial pET vectors with a T7-lac system). For example, proteins from the borrelidin polyketide synthase from Streptomyces parvulus were found to have improved expression using the pTet promoter when compared to the T7 system 59,62,63. Extensive efforts have been made to apply the T7 promoter series under the lac operator in CFE 56,64. However, other inducible systems remain extremely underexplored, with only a few reports of their usage in lysate-based CFE systems [65][66][67].
To compare the performance of each promoter in a lysate-based system, extracts were first prepared from E. coli BL21 Star(DE3) in the absence of IPTG. While IPTG is typically added to BL21 Star(DE3) cultures to promote the expression of T7 RNA polymerase prior to cell lysis, we omitted this step to prepare a lysate background that is appropriate for the comparison of non-IPTG inducible promoters. Using this batch of lysate, reactions were first optimized to express RppA under pT7 with IPTG supplemented to the lysate reaction. IPTG was supplemented at varying concentrations to reactions containing 50 ng/µL T7 RNA polymerase 68. Maximal flaviolin production was observed with a concentration of 500 µM IPTG (Fig. S3A,C). The same lysate preparations were then used for protein expression driven by the pTet and pBAD promoters under different ranges of anhydrotetracycline and L-arabinose, respectively. Initial efforts did not result in detectable flaviolin production. We hypothesized that flaviolin signals were not detectable under these conditions due to lower soluble protein expression levels (further supported by Western blot, Figs. S9, S11) and, consequently, less flux being driven from endogenous malonyl-CoA precursor pools to flaviolin formation.
Intriguingly, when optimized inducer concentrations are applied, overall flaviolin production is comparable between the pTet and pT7 conditions, even though synthesis is clearly delayed in the former system. Faster expression in the pT7 system could be due to promoter strength or the availability of exogenously supplied T7 RNA polymerase, whereas the other promoters rely on low levels of endogenous E. coli RNA polymerase. To test whether fast expression from pT7 depends on polymerase availability, we first supplied reactions with decreasing concentrations of T7 RNA polymerase. While overall flaviolin production decreases with lower polymerase levels, flaviolin synthesis still begins within the first hour under all conditions. These data imply that fast expression under pT7 is due to this system being less tightly regulated compared to pTet and pBAD and not necessarily the availability of the polymerase (Fig. S6). To further interrogate this phenomenon, we tested our promoter strategy with sfGFP, allowing us to distinguish protein expression from enzyme catalysis or precursor availability. As an initial step, we verified that the inducer concentrations that performed best for RppA expression also performed best for inducing sfGFP expression under the control of the different promoters in CFE reactions (Fig. S7A-C). We subsequently compared sfGFP synthesis from these inducible promoters and a constitutive promoter, pJ23101, in an analogous BioBrick vector (pBbEJk). Evidently, there is a delay in the expression of sfGFP under the control of the pTet, pBAD, and pJ23101 promoters compared to pT7 (Fig. S7D). While the constitutive promoter produced relatively low levels of sfGFP, it did result in sfGFP expression over a faster timeframe compared to pTet and pBAD (Fig. S7D & E). Thus, delayed CFE from pTet and pBAD is likely a result of their tighter regulation compared to pT7.

Synergistic effect of promoter and coding strategy in refactoring proteins for RppA CFE
After establishing a set of experimental conditions for detectable flaviolin formation driven by non-T7 inducible promoters, we sought to determine the synergistic effect of promoter plus coding strategy in refactoring a protein for optimized expression in our cell-free system. Overall, we found the strongest promoter-coding strategy to be the pBAD promoter with the ROC-SEMPPR-harmonized sequence (HR-rppA), which gives slightly higher expression than the CHARMING-harmonized sequence (HC-rppA) driven by the pTet promoter and significantly higher expression than all other constructs (Fig. 4A-C). Whether or not ROC-SEMPPR will perform similarly well in other situations remains to be determined. It does, however, suggest that simply ranking codons based on their occurrence in a genome (which ignores the role of mutation bias and variation in expression between genes) or transcriptome (which ignores the role of mutation bias and the limits drift places on adaptation), while useful, can be improved upon, which was the original rationale for developing the ROC-SEMPPR model. In addition, we observed that expression with the pT7 promoter is drastically increased when using the lysate that contained endogenous IPTG compared to lysate with exogenous IPTG (Fig. S6). Interestingly, with the pT7 promoter, the HC construct produced the least amount of flaviolin, even lower than the natively coded sequence cloned from genomic DNA. The CHARMING method used to generate the HC construct uses sliding windows to estimate local rates of translation across the coding sequence, e.g., to best facilitate natural ribosomal stalling (see Methods) 43; however, because few rare codons are observed in the native rppA coding sequence, it appears that this method is not necessary to promote functional protein production using a pT7 promoter (Fig. 4A). The trend among the coding sequences is different for each promoter. For the pTet promoter, HC-rppA has the best expression and HR-rppA the second best, while the native construct (N-rppA) does not express at all (Fig. 4B). Similarly, with the pBAD promoter, the native construct has low expression, while HR-rppA has the best expression followed by HC-rppA (Fig. 4C). Due to the delay in expression observed while optimizing the induction of pTet and pBAD (Figs. S4 and S5), we monitored RppA production for a longer time period compared to the T7 constructs. Because these codon harmonization models do not account for inducible expression, we sought to determine the effects of these re-coding strategies on flaviolin production when RppA is expressed under pJ23101, allowing us to decouple expression from induction. In this case, HR-rppA expressed drastically better than the other constructs, whereas O-rppA did not express at all (Fig. 4D). Importantly, these experiments show that the choice of codon optimization/harmonization technique and the choice of promoter impact protein expression synergistically, which is an important consideration for future efforts to use CFE for high-yield protein expression. The level of protein expressed is correlated with the amount of flaviolin detected and can be visualized by western blot analysis of pooled lysate reactions (Figs. S9, S11). Additionally, unlike RppA, sfGFP expressed best under pT7 control, while the constitutive promoter does not express well in these CFE conditions (Fig. S7). Thus, these results also confirm that different proteins have varying optimal CFE expression conditions 54.

Investigating the utility of CFE for prototyping refactoring techniques for in vivo production
Expanding strategies for profiling expression choices to less explored choices (e.g., non-T7 promoters and lesser used refactoring strategies) and demonstrating their synergistic effects in CFE is more valuable when there are correlations between CFE and in vivo expression 57,69,70. To determine whether the refactoring strategies we explored correlate with in vivo expression, we transformed each of our refactored constructs into BL21 Star(DE3) cells. First, the OD 600 of the codon-optimized construct for each promoter was measured to determine the induction time. The optimal OD 600 for each promoter/operator combination was based on literature precedent for standard ODs of induction for each promoter (OD 600 = ~0.8 for T7-lac, OD 600 = ~0.2 for pBAD, OD 600 = ~0.6 for pTet) [71][72][73]. In a 96-well plate, from an initial culture with OD 600 = 0.05, pT7 constructs take 3.5 h to reach OD 600 = 0.8, pTet constructs take 3 h to reach OD 600 = 0.6, and pBAD constructs take 70 min to reach OD 600 = 0.2. Next, we varied the inducer concentrations for each promoter (Fig. S8). Of the conditions we evaluated, we found that the best inducer concentrations differ from those in CFE reactions: expression is greatest with IPTG at 500 µM, anhydrotetracycline at 1000 nM, and L-arabinose at 5 mM. These conditions agree with those previously reported for RFP; thus, they were also used for sfGFP expression 28.
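The growth times quoted above can be converted into rough apparent doubling times. The sketch below assumes simple exponential growth with no lag phase between the stated starting OD 600 of 0.05 and each induction OD, which is an idealization rather than something reported in the study; the function name is hypothetical.

```python
import math

def apparent_doubling_time(od_start, od_end, minutes):
    """Minutes per doubling, assuming pure exponential growth (no lag phase)."""
    doublings = math.log2(od_end / od_start)
    return minutes / doublings

# (target OD 600, minutes to reach it) for each promoter, from the text.
induction_points = {
    "pT7": (0.8, 210),
    "pTet": (0.6, 180),
    "pBAD": (0.2, 70),
}

for promoter, (od_end, minutes) in induction_points.items():
    t_double = apparent_doubling_time(0.05, od_end, minutes)
    print(f"{promoter}: {t_double:.1f} min per doubling")
# pT7: 52.5, pTet: about 50.2, pBAD: 35.0
```

Under this assumption the apparent doubling time lengthens as cultures approach higher densities, which is one reason a single fixed induction time does not transfer between promoters with different target ODs.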
When comparing the effect of codon optimization/harmonization on RppA expression, without considering promoter choice, trends only correlate between the in vivo and in vitro experiments in the pT7 and the constitutive promoter data (Fig. 5A,D). When expressing with pTet, HC-rppA outperforms HR-rppA in vitro, while these two harmonized sequences perform similarly in vivo. Differences in flaviolin synthesis from varying coding sequences under pBAD expression are indistinguishable in vivo (Fig. 5C), and production is generally lower than in the cell-free system (Fig. 4C). Efficient catabolism of L-arabinose by E. coli cells is a drawback of arabinose-inducible promoters, which may be a potential cause for better flaviolin synthesis in pBAD-regulated CFE 26. Notably, the current CFE system cannot be used to prototype promoter choice for the in vivo expression of RppA, given any coding sequence (Figs. 4, 5). The pT7 expression measurements were collected from the time the inducer IPTG was added, which was after 3.5 h of growth (Fig. 5A), whereas the measurements of the pJ23101 constructs were collected right after inoculation into the 96-well plate. It takes about 3-4 h for the cells to grow before they start to produce pigment, which explains the delay observed with the pJ23101 promoter (Fig. 5D). The level of protein expressed correlates with the amount of flaviolin detected and can be visualized by western blot analysis of pooled lysate reactions (Figs. S10, S11). This is also true for sfGFP expression. While sfGFP expressed highest using pT7 in both systems, pBAD-sfGFP expressed better in CFE, while the constitutive promoter and pTet promoter constructs expressed better in vivo (Figs. S7D & 4E). This set of experiments suggests that there is strong correlation, but not complete alignment, between expression in cell-free systems and in vivo.
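One simple way to quantify how well coding-sequence rankings agree between the two systems is a Spearman rank correlation computed per promoter. The sketch below is an assumption about how such a comparison could be done, not the analysis used in this study, and the flaviolin values in it are invented placeholders, not measured data.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Placeholder end-point signals for N-, O-, HC-, HR-rppA under one promoter.
cfe_signal = [0.10, 0.45, 0.30, 0.80]      # invented values
in_vivo_signal = [0.05, 0.50, 0.20, 0.90]  # invented values
print(spearman(cfe_signal, in_vivo_signal))
```

Because only four coding sequences are compared per promoter, such a coefficient is best read as a coarse summary of rank agreement rather than as a statistical test.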

Conclusion
Natural product synthesis in non-native contexts requires the successful translation and folding of biosynthetic genes. Refactoring choices to improve the heterologous expression and activity of these enzymes include the use of inducible or constitutive promoters and codon optimization/harmonization strategies. These elements are commonly explored for in vivo protein synthesis purposes, but long Design-Build-Test-Learn (DBTL) cycles associated with cellular engineering can be limiting when evaluating gene refactoring strategies. Cell-free systems enable the accelerated testing of such tactics and thus enable the rapid optimization of refactoring choices for in vitro or in vivo expression. However, besides pT7 promoter systems and a selection of constitutive promoters 74, other promoter systems have not been used extensively in CFE. Tools for codon harmonization, as opposed to codon optimization, have also not been considered for CFE. Thus, we aimed to explore whether inducible promoter systems and codon harmonization can benefit in vitro protein synthesis or be prototyped in CFE reactions for in vivo implementation.
In the cell-free context, we show that inducible pTet- and pBAD-regulated expression, while slower than pT7, allows higher yields of flaviolin. The same is true for constitutive expression, as codon harmonization algorithms are likely to more accurately measure the "tempo" of translation elongation 43,75. We demonstrate that even in a gene with mostly efficient codons (74.5% rank 1, 21.7% rank 2; avg rank 1.33 based on ROC-SEMPPR), codon harmonization improves natural product synthesis more than uniform codon optimization across the cell-free and cellular environments considered here (7/8 cases). This is consistent with Keasling and coworkers' recent report that in some, but not all, heterologous hosts, codon harmonization can be superior to other codon optimization methods for expressing a type I polyketide synthase gene from an actinomycete 21, providing further support to the notion that codon harmonization should be explored more generally to promote improved protein production from biosynthetic genes from Actinobacteria. Interestingly, we found that different harmonization methods do not work equally well for rppA. Consistent with the original protein coding sequence having few "slow" codons, which probably affect co-translational folding the most 34, in 6 out of the 8 cases evolutionarily based harmonization, i.e., fitting the evolutionarily based ROC-SEMPPR to the donor and host genomes to determine and replace codons based on individual codon ranks, performed substantially better than the window-based CHARMING approach (in one of the two other cases, the differences were almost indistinguishable). We also found that the choice of promoter influences the outcome of refactoring coding sequences. These interactions also vary between in vitro and in vivo reactions, particularly for non-pT7 inducible promoters, for which the relative activities of promoters and the synergies between promoter usage and coding sequences correlate poorly between cell-free and cell-based systems. In conclusion, refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs. This can in turn accelerate DBTL cycles to generate valuable metabolites.

In vivo Flaviolin measurement
E. coli BL21 Star(DE3) was used as the host strain for in vivo expression of RppA. Cultures were grown in 2xYPTG media (10 g/L yeast extract, 7 g/L potassium phosphate dibasic, 3 g/L potassium phosphate monobasic, 5 g/L NaCl, 16 g/L tryptone, and 18 g/L glucose) supplemented with kanamycin at 50 µg/mL. Overnight seed cultures (2 mL) were grown from a fresh single colony at 37 °C, shaking at 210 rpm. In a 96-well plate (Greiner), all constructs were inoculated at an initial OD600 of 0.005. Cultures were induced with 5000 µM L-arabinose, 100 nM anhydrotetracycline, or 500 µM IPTG after 110 min, 180 min, and 210 min, respectively. The plate was covered with an adhesive plate seal (Thermo Scientific) and measured on a VARIOSKAN LUX (Thermo Scientific) plate reader. [...], however, suggested a level of purity that was only appropriate for assessing relative concentration rather than absolute concentration.

Cell-free extract preparation
The same extract preparation procedure was used for all strains. A seed culture was prepared with 30 mL 2xYPTG media (10 g/L yeast extract, 7 g/L potassium phosphate dibasic, 3 g/L potassium phosphate monobasic, 5 g/L NaCl, 16 g/L tryptone, and 18 g/L glucose) inoculated with a fresh colony and incubated overnight at 37 °C, 220 rpm. 1 L of 2xYPTG media in a 2.5 L Tunair flask was then inoculated with the overnight culture and grown at 37 °C, 220 rpm. Cell growth was monitored by NanoDrop (Thermo Scientific). Cells were harvested at an OD600 of ~2.8-3.2 by centrifugation (5000 × g, 15 min, 10 °C), then washed three times using S30 buffer (10 mM Tris-acetate, 14 mM magnesium acetate, 60 mM potassium acetate, and 10 mM DTT). All wash steps were performed at 4 °C. Cell pellets were then weighed, flash-frozen, and stored at -80 °C. For extract preparation, the cell pellets were thawed on ice and resuspended in 0.8 mL of S30 buffer per g of cell pellet by vortexing in short bursts (vortex 15 s, rest 30 s, repeat). 1.4 mL aliquots were sonicated on ice in 2 mL microcentrifuge tubes using an OMNI Sonic Ruptor 400 (45 s on, 59 s off for three cycles, amplitude set to 50%). 4.5 µL of 1 M DTT was added to each tube immediately after sonication. All samples were centrifuged at 12,000 × g for 10 min at 4 °C. The supernatant was collected without disturbing the pellet and centrifuged again to remove the remaining debris. The resulting supernatants were aliquoted into fresh centrifuge tubes, flash-frozen, and stored at -80 °C.
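The resuspension and DTT steps above reduce to simple bench arithmetic, which can be sketched as follows. This is a minimal illustration rather than part of the protocol: the function names and the example pellet mass are hypothetical, and the DTT estimate ignores the DTT already present in the S30 buffer.

```python
# Sketch of the bench arithmetic for lysate preparation.
# The ratios below are taken from the protocol text; the function names
# and any pellet mass passed in are hypothetical illustrations.

S30_ML_PER_G = 0.8     # mL S30 buffer per g of wet cell pellet
ALIQUOT_ML = 1.4       # mL of cell suspension per sonication tube
DTT_STOCK_M = 1.0      # concentration of the DTT stock (M)
DTT_SPIKE_UL = 4.5     # µL of DTT stock added per tube after sonication


def s30_volume_ml(pellet_g: float) -> float:
    """Volume of S30 buffer (mL) used to resuspend a pellet of the given wet mass."""
    if pellet_g <= 0:
        raise ValueError("pellet mass must be positive")
    return S30_ML_PER_G * pellet_g


def dtt_added_mm(aliquot_ml: float = ALIQUOT_ML) -> float:
    """DTT concentration (mM) contributed by the post-sonication spike.

    Assumption: only the spike is counted; DTT carried over from the
    S30 buffer is ignored.
    """
    spike_ml = DTT_SPIKE_UL / 1000.0
    total_ml = aliquot_ml + spike_ml
    return DTT_STOCK_M * 1000.0 * spike_ml / total_ml
```

For example, a hypothetical 5 g pellet would call for `s30_volume_ml(5.0)` mL of buffer, and `dtt_added_mm()` gives the approximate DTT increase per 1.4 mL aliquot (about 3.2 mM).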

CFE reaction preparation
The cell-free reaction comprised 1. The type of inducer used and changes to any of these conditions are described in the text. All reactions were incubated in a 96-well PCR plate (VWR #47744-116). Surrounding wells were filled with 1 × phosphate-buffered saline (PBS) to control the humidity and prevent evaporation. Plates were covered with an adhesive plate seal (Thermo Scientific) before being placed in the plate reader. Flaviolin synthesis was monitored by reading reaction absorbance at 340 nm over varying timeframes and intervals, as described in the text.

Flaviolin quantitation in lysates with absorbance measurements
To generate standard curves from pigment absorbance measurements, increasing concentrations of the purified pigment dissolved in DMSO were spiked into BL21 Star(DE3) lysate mock reactions (i.e., reactions without DNA). Absorbance measurements were made in a 96-well PCR plate, without a lid, loaded into a VARIOSKAN LUX (Thermo Scientific) plate reader. The read protocol was set to shake the plate at high speed for 2 s and then measure absorbance in selected wells at 340 nm. The resulting values were then normalized to the 0 µM pigment condition.
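A minimal sketch of how such a standard curve could be fitted and inverted is given below. It assumes that normalization means subtracting the 0 µM reading and that the response is linear over the range tested; neither assumption is stated in the text. The concentrations and absorbances are made-up placeholders, not data from this study, and all function names are hypothetical.

```python
# Hypothetical illustration of a linear standard curve for flaviolin in lysate.
# Assumptions: normalization is subtraction of the 0 µM reading, and the
# response is linear. All numbers below are placeholders, not measured data.

def blank_normalize(a340, blank):
    """Subtract the 0 µM pigment reading from each raw A340 value."""
    return [a - blank for a in a340]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_concentration(a340_norm, slope, intercept):
    """Invert the standard curve to estimate pigment concentration (µM)."""
    return (a340_norm - intercept) / slope


# Placeholder standards: spiked pigment (µM) and raw A340 in mock reactions.
conc_um = [0.0, 50.0, 100.0, 200.0]
raw_a340 = [0.40, 0.50, 0.60, 0.80]

norm = blank_normalize(raw_a340, raw_a340[0])
slope, intercept = fit_line(conc_um, norm)
estimate = to_concentration(0.30, slope, intercept)  # normalized A340 of 0.30
```

Any concentration read off such a curve is only as reliable as the purity of the pigment standard used to build it.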
To measure the absorbance of flaviolin produced by cell-free expressed RppA, base reaction mixes with BL21 Star(DE3) lysate and RppA-expressing plasmid DNA were prepared. Modifications to the reactions for validating pigment production are described in the text. All reactions were laid out on a 96-well PCR plate and measured every 10 s for 20 h at 30 °C. A340 measurements were taken and normalized as described above.
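The normalization of a time course, followed by the triplicate summary reported in the figure captions (mean with standard error of the mean, n = 3), can be sketched as follows. This is an illustration under stated assumptions: normalization is taken to be point-by-point subtraction of a no-DNA mock reaction, the function names are hypothetical, and the readings are placeholders rather than measured values.

```python
import math

# Hypothetical sketch: blank-subtract replicate A340 time courses and
# summarize each time point as mean and standard error of the mean (SEM).
# Assumption: the blank is a no-DNA mock reaction read at the same time points.

def mean_and_sem(values):
    """Return (mean, SEM) using the sample standard deviation (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def summarize(replicates, blank):
    """Blank-subtract each replicate, then compute (mean, SEM) per time point."""
    if any(len(rep) != len(blank) for rep in replicates):
        raise ValueError("all time courses must have the same length")
    summary = []
    for t, blank_value in enumerate(blank):
        normalized = [rep[t] - blank_value for rep in replicates]
        summary.append(mean_and_sem(normalized))
    return summary


# Placeholder readings: three replicates at three time points, plus a blank.
replicates = [
    [0.40, 0.55, 0.70],
    [0.42, 0.57, 0.76],
    [0.38, 0.53, 0.64],
]
blank = [0.40, 0.40, 0.40]

summary = summarize(replicates, blank)
```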

Quantification of active sfGFP
Fluorescence measurements of reactions expressing sfGFP were taken using top optics on a VARIOSKAN LUX (Thermo Scientific). Excitation and emission filters were set to 485 nm and 538 nm, respectively.

SDS-PAGE and Western Blot analysis
In vivo expression of RppA was conducted using E. coli BL21 Star(DE3) as the host strain, harboring plasmids carrying one of the four RppA coding sequences under the pJ23101 promoter. Cultures were grown in LB broth (Miller) supplemented with kanamycin at 50 µg/mL. Overnight seed cultures were grown in 25 mL LB broth with kanamycin (50 µg/mL), inoculated with a single colony, at 37 °C, shaking at 210 rev/min. Expression cultures (50 mL) were inoculated from these seed cultures at a ratio of 1:100 and incubated at 37 °C, shaking at 210 rev/min, until an OD600 of 0.6-0.8 was reached. The temperature was then lowered to 16 °C and the cultures were incubated for ~20 h prior to harvest by centrifugation (4000 × g, 15 min at 4 °C). Cell pellets were resuspended in a wash buffer (100 mM Tris/HCl pH 8.0, 150 mM NaCl, 1 mM EDTA) and lysed by sonication (3 × 30 min on, 1 min off). After sonication, the lysate was clarified via centrifugation (9000 × g, 30 min, 4 °C).
5 µL of RppA lysate, taken from the pooled triplicate lysates, was denatured with 5 µL 2 × Laemmli sample buffer (BioRad #1610737). After boiling at 98 °C for 10 min, 5 µL of the denatured protein was loaded into a pre-cast 4-20% gel (ThermoFisher XP04205BOX). The gel was run for 10 min at 80 V followed by 60 min at 220 V.
For protein expressed cell-free, triplicate reactions were pooled into a microcentrifuge tube after the reaction time (12 h for the pT7 promoter, 38 h for the pTet and pBAD promoters, and 20 h for the pJ23101 promoter). 5 µL from the reaction was added to 5 µL 2 × Laemmli sample buffer (BioRad #1610737) in a PCR tube. After boiling at 98 °C for 10 min, 8 µL of the denatured protein was loaded into a pre-cast 4-20% gel (ThermoFisher XP04205BOX). The gel was run for 10 min at 80 V followed by 60 min at 220 V.
For Western Blot analysis, a ThermoFisher Mini gel tank and Mini Blot Module were used to transfer bands onto a nitrocellulose membrane (20 V, 60 min). The membrane was blocked using 20 mL PBS-blocking buffer (PBS buffer with 3% BSA and 0.5% v/v Tween 20) for 1 h. The membrane was then washed 3 times with 20 mL PBS-Tween buffer (PBS buffer with 0.1% v/v Tween 20) for 5 min at room temperature with gentle shaking. Next, the membrane was incubated for 10 min in 10 mL PBS-Tween buffer with 10 µL Biotin Blocking Buffer (iba 2-0205-050), followed by a 60 min incubation after the addition of 2.5 µL Strep-Tactin horseradish peroxidase conjugate (Bio-Rad 161381). The membrane was washed twice with PBS-Tween buffer for 1 min and then washed with PBS buffer for 1 min before being transferred into 20 mL PBS buffer with 200 µL chloronaphthol solution and 20 µL H2O2 solution. The chromogenic reaction was observed after ~10 min.

Figure 2. Design and nomenclature of plasmids in this study. (A) Each plasmid tested is composed of two modules: a backbone carrying one of the promoters (pT7, pTet, pBAD) and one of the coding sequences of the gene of interest, rppA (N-rppA, O-rppA, HC-rppA, or HR-rppA). (B) Names of the BioBrick vectors (pBb) carrying different promoters 28. (C) Names of the varied coding sequences.

Figure 3. Flaviolin production via RppA in CFE reactions. (A) Schematic of the flaviolin biosynthetic pathway catalyzed by the type III PKS, RppA. (B) CFE reactions were initiated with increasing concentrations of plasmid DNA carrying codon-optimized rppA (O-rppA) driven by the pT7 promoter, resulting in increasing flaviolin production (observed as an increase in absorbance at 340 nm).

Figure 4. Flaviolin measurement in CFE reactions initiated with plasmids carrying different promoter and coding sequence combinations. Error bars represent the standard error of the mean (n = 3). (A) CFE reactions containing one of four pT7 promoter constructs, 50 ng/µL T7 RNA polymerase, and 500 µM IPTG. Reactions were run in triplicate and read every 10 min for 12 h. (B) CFE reactions containing one of four pTet promoter constructs, 500 µM malonyl-CoA, and 50 µM tetracycline. Reactions were run in triplicate and read every 20 min for 38 h. (C) CFE reactions containing one of four pBAD promoter constructs, 500 µM malonyl-CoA, and 10 mM L-arabinose. Reactions were run in triplicate and read every 20 min for 38 h. (D) CFE reactions containing one of four pJ23101 promoter constructs and 500 µM malonyl-CoA. Reactions were run in triplicate and read every 10 min for 20 h. All reactions were prepared with E. coli BL21 Star(DE3) lysates.

Figure 5. Flaviolin formation in E. coli BL21 Star(DE3) cells carrying different promoter and coding sequence combinations. Flaviolin production in cells carrying (A) pT7 promoter constructs post-induction with 500 µM IPTG (added after 3.5 h of growth), (B) pTet promoter constructs post-induction with 100 nM anhydrotetracycline (added after 3 h of growth), (C) pBAD constructs post-induction with 5000 µM L-arabinose (added after 1 h and 10 min), and (D) pJ23101 constructs. (E) Expression of sfGFP in BL21 Star(DE3) cells. Control (CT) is BL21 Star(DE3) without plasmid DNA. Reactions were run in triplicate and read every 10 min for 20 h. Error bars represent the standard error of the mean (n = 3).
A 15 mL seed culture of BL21 Star(DE3) harboring the plasmid with optimized rppA driven by pT7 was grown overnight (37 °C, 210 rev/min) in LB medium supplemented with 50 µg/mL kanamycin. After ~20 h, 10 mL of seed culture was used to inoculate 1 L of media in a Fernbach flask (VWR 29171-854). Cells were incubated at 37 °C, shaking at 210 rev/min. At an OD600 of ~0.8, the culture was induced with 0.5 mM IPTG and grown at 16 °C for 20 h. The culture was then centrifuged at 5000 × g for 30 min. The pink supernatant was adjusted to pH 2 with 3 M HCl and incubated at 4 °C overnight to precipitate flaviolin. Pigments were recovered by centrifugation at 5000 × g for 30 min, and the precipitate was washed with DI water. The pellet was then dried at 50 °C in an oven overnight. The dried pellets were washed with 6 M HCl at 100 °C to remove proteins and carbohydrates, then centrifuged at 5000 × g for 10 min. The precipitate was washed with ethanol and chloroform and then dried at 50 °C overnight. Identity was confirmed via Direct Analysis in Real Time mass spectrometry on a DART-AccuTOF mass spectrometer and 1H NMR on a Varian Mercury 500 MHz spectrometer. HRMS-DART [M + H]+ calculated for C10H6O5: 206.02000; found: 206.16855. 1H NMR had peaks consistent with literature report A